Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

Authors

  • Rong Ge
  • Furong Huang
  • Chi Jin
  • Yang Yuan
Abstract

We analyze stochastic gradient descent for optimizing non-convex functions. In many cases, for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates become trapped in saddle points. In this paper we identify the strict saddle property for non-convex problems that allows for efficient optimization. Using this property we show that, from an arbitrary starting point, stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge, this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has the strict saddle property. As a result we obtain the first online algorithm for orthogonal tensor decomposition with a global convergence guarantee.
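As a rough illustration of the dynamics the abstract describes (noisy stochastic gradient descent escaping a strict saddle point), the sketch below runs noisy gradient steps on a toy strict-saddle function. The objective, step size, and noise scale are illustrative assumptions, not the paper's actual algorithm or its tensor-decomposition objective.

```python
import numpy as np

# Toy strict-saddle objective (illustrative only, not from the paper):
# f(x, y) = x^4/4 - x^2/2 + y^2/2
# (0, 0) is a strict saddle (Hessian diag(-1, 1)); local minima at (+-1, 0).
def grad(p):
    x, y = p
    return np.array([x**3 - x, y])

rng = np.random.default_rng(0)
p = np.zeros(2)          # start exactly at the saddle point
eta, sigma = 0.05, 0.1   # step size and noise level (hypothetical choices)

for t in range(2000):
    noise = sigma * rng.standard_normal(2)   # injected isotropic noise
    p = p - eta * (grad(p) + noise)

print(p)   # typically ends close to (+1, 0) or (-1, 0), i.e. a local minimum
```

Starting exactly at the saddle, the injected noise pushes the iterate onto a descent direction, after which it settles near one of the two local minima.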

Similar articles

The Power of Normalization: Faster Evasion of Saddle Points

A commonly used heuristic in non-convex optimization is Normalized Gradient Descent (NGD), a variant of gradient descent in which only the direction of the gradient is taken into account and its magnitude is ignored. We analyze this heuristic and show that, with carefully chosen parameters and noise injection, this method can provably evade saddle points. We establish the convergence of NGD to a loc...
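To make the normalized update concrete, here is a minimal sketch of noisy NGD on the same kind of toy strict-saddle objective as above: each step follows only the direction of a noise-perturbed gradient. The parameters are illustrative assumptions, not the schedule analyzed in the cited paper.

```python
import numpy as np

# Same toy strict-saddle objective: f(x, y) = x^4/4 - x^2/2 + y^2/2.
def grad(p):
    x, y = p
    return np.array([x**3 - x, y])

rng = np.random.default_rng(1)
p = np.zeros(2)          # start at the saddle point
eta, sigma = 0.01, 0.1   # illustrative step size and noise scale

for t in range(3000):
    g = grad(p) + sigma * rng.standard_normal(2)    # noise injection
    p = p - eta * g / (np.linalg.norm(g) + 1e-12)   # normalized step: direction only

print(p)   # tends to settle near one of the minima (+-1, 0)
```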

Full text

Escaping Saddles with Stochastic Gradients

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensiona...
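The quantity referred to above can be checked numerically on a toy model. The sketch below uses an assumed per-sample loss (not one from the cited paper) and measures the variance of per-sample gradients projected onto the most negative curvature direction of the average Hessian near a saddle point.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 10
A = rng.standard_normal((n, d))   # toy data
A[:, 0] *= 3.0                    # strongest negative curvature along e_0

# Assumed per-sample loss f_i(w) = -(a_i . w)^2 / 2 + ||w||^4 / 4,
# so w = 0 is a saddle of the average loss with Hessian -E[a a^T] there.
def sample_grads(w):
    return -(A @ w)[:, None] * A + (w @ w) * w

w = 1e-3 * rng.standard_normal(d)    # point slightly off the saddle

H = -(A.T @ A) / n                   # Hessian of the average loss at w = 0
eigvals, eigvecs = np.linalg.eigh(H)
v = eigvecs[:, 0]                    # direction of most negative curvature

proj = sample_grads(w) @ v           # per-sample gradient components along v
print("most negative eigenvalue:", eigvals[0])
print("variance of stochastic gradient along v:", proj.var())
```

The printed variance is nonzero along the negative-curvature direction, which is the kind of signal the abstract argues noisy gradient methods can exploit.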

Full text

Discovery of Latent Factors in High-dimensional Data Using Tensor Methods

Abstract of the dissertation: Discovery of Latent Factors in High-dimensional Data Using Tensor Methods. By Furong Huang, Doctor of Philosophy in Electrical and Computer Engineering, University of California, Irvine, 2016. Assistant Professor Animashree Anandkumar, Chair. Unsupervised learning aims at the discovery of hidden structure that drives the observations in the real world. It is ...

Full text

Stabilizing Adversarial Nets with Prediction Methods

Adversarial neural networks solve many important problems in data science, but are notoriously difficult to train. These difficulties come from the fact that optimal weights for adversarial nets correspond to saddle points, and not minimizers, of the loss function. The alternating stochastic gradient methods typically used for such problems do not reliably converge to saddle points, and when co...
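One way to read the "prediction" idea mentioned in the title is an extrapolation step inside alternating gradient descent/ascent. The sketch below applies such a step to a toy bilinear saddle problem; it is an illustrative reconstruction under that assumption, not the exact update rule from the cited paper.

```python
# Toy bilinear min-max problem: min_u max_v f(u, v) = u * v, saddle point (0, 0).
# Illustrative extrapolation ("prediction") step in alternating descent/ascent.
eta = 0.1
u, v = 1.0, 1.0

for k in range(2000):
    u_new = u - eta * v           # descent step on u (grad_u f = v)
    u_bar = u_new + (u_new - u)   # prediction: extrapolate u one step ahead
    v = v + eta * u_bar           # ascent step on v (grad_v f = u) uses predicted u
    u = u_new

print(u, v)   # the iterates contract toward the saddle point (0, 0)
```

On this toy problem, plain alternating descent/ascent merely circles the saddle point, whereas the extrapolated update contracts toward it.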

Full text


Journal title:

Volume   Issue

Pages  -

Publication year: 2015